This article reviews the work of K. Yao and C. Y. Chang on the application of systolic
priority queues to the sequential stack decoding algorithm. Using a systolic array
architecture, one can significantly improve the performance of such algorithms at high
signal-to-noise ratio (SNR). However, their applicability at low SNR is doubtful.


I. Introduction 

An active area of current research on deep space communi- 
cations is the development of codes usable at low signal-to- 
noise ratio. The requirements for such codes are low error 
probability and the practicality of the decoding algorithm. 
It is well known that the error probability of convolutional 
codes decreases with their constraint lengths. However, the 
complexity of the Viterbi algorithm, which is the standard 
method for decoding convolutional codes at low SNR, has 
exponential growth with the code’s constraint length. In a 
report on research conducted under a JPL contract, Yao and 
Chang suggest that, by using a systolic array architecture, 
decoding procedures for long constraint length codes can be 
practically implemented. Their main contention is that the
sequential stack algorithm, implemented with systolic arrays,
is a viable alternative to Viterbi’s method. This article
reports on the scope and limitations of their approach. The
most serious limitation is that it may not be useful at the low
signal-to-noise ratios typical of deep space communication.

II. Stack Algorithm 

The encoding procedure for a convolutional code can be
regarded as a route through the code tree in the usual manner.
A received symbol sequence is then a path in the code
tree. To every path, x, is associated a real number m(x) called
the Fano metric. Figure 1 is a schematic description of the
stack algorithm for decoding (see [1] for further details).
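
As a rough illustration, the stack algorithm as it is commonly described can be sketched in Python with a priority queue ordered by Fano metric. Everything specific in this sketch is an assumption made for the example and is not taken from the work under review: the toy rate-1/2, constraint-length-3 code with generators (7, 5), the binary symmetric channel with crossover probability p, and the names encode_step, encode, fano_branch_metric, and stack_decode.

```python
import heapq
import math

# Assumed toy code: rate 1/2, constraint length 3, generators (7, 5) octal.
GENERATORS = (0b111, 0b101)
K = 3  # constraint length


def encode_step(state, bit):
    """Return (output_bits, next_state) for one input bit."""
    reg = (bit << (K - 1)) | state  # newest bit occupies the top position
    out = tuple(bin(reg & g).count("1") % 2 for g in GENERATORS)
    return out, reg >> 1


def encode(bits):
    """Encode a list of input bits, starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        o, state = encode_step(state, b)
        out.extend(o)
    return out


def fano_branch_metric(received, expected, p, rate):
    """Fano metric of one branch on a BSC with crossover probability p."""
    good = math.log2(2 * (1 - p)) - rate  # received bit agrees with the branch
    bad = math.log2(2 * p) - rate         # received bit disagrees
    return sum(good if r == e else bad for r, e in zip(received, expected))


def stack_decode(received, n_bits, p=0.05, rate=0.5):
    """Extend the best path repeatedly until it reaches the end of the tree."""
    n_out = len(GENERATORS)
    # heapq is a min-heap, so metrics are stored negated: (-metric, path, state).
    stack = [(0.0, (), 0)]
    while stack:
        neg_metric, path, state = heapq.heappop(stack)  # best path only
        if len(path) == n_bits:
            return list(path)
        depth = len(path)
        r = received[depth * n_out:(depth + 1) * n_out]
        for bit in (0, 1):
            out, nxt = encode_step(state, bit)
            metric = -neg_metric + fano_branch_metric(r, out, p, rate)
            heapq.heappush(stack, (-metric, path + (bit,), nxt))
    return None


message = [1, 0, 1, 1, 0, 0, 1, 0, 0]  # the last two zeros flush the encoder
received = encode(message)
received[3] ^= 1                       # a single channel error
decoded = stack_decode(received, len(message))
print(decoded == message)
```

The sketch keeps an unbounded stack. A practical decoder has a finite buffer, which is where overflow and erasures arise.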

The Fano metric is constructed with the maximum likelihood
criterion in mind. In fact, it is generally accepted
that the stack algorithm is a good approximation
to maximum likelihood decoding at high SNR.

The simplicity of the stack algorithm, as compared to
Viterbi’s, is reflected in the design of the hardware for the
decoder. The wiring problem for the Viterbi decoder becomes
extremely complicated for convolutional codes of constraint
length >15, while the same problem remains fairly simple for
the stack algorithm. Exponential growth of the layout area
with the constraint length is another serious problem for
VLSI design of large constraint length Viterbi decoders. For
the stack algorithm, the growth of the layout area is only
linear with the constraint length.

In spite of these advantages, the stack algorithm has
remained unpopular for several reasons. Most notable are:

(1) The reordering of paths according to the metric requires
a large memory and is very time consuming. This often
leads to overflow of the buffer and erasures.

(2) The performance of the stack algorithm at low SNR is
considerably inferior to that of the Viterbi algorithm.

For low data rate and/or two-way communications, era- 
sures (frame deletions) are not a serious problem. However, for 
application to deep space communication, this effect presents 
a major obstacle. This problem will be discussed in more detail 
in Section III.

In their articles on the application of systolic priority 
queues to sequential decoding ([2] and private communica- 
tion contained in a report “Systolic Array Processing for 
Stacking Algorithms” which was submitted to Jet Propulsion 
Laboratory, Pasadena, California as a Second Progress Re- 
port), Chang and Yao note that a complete reordering of 
paths is not necessary for the implementation of the stack 
algorithm. In fact, the choice of the best path, i.e., the path
with the largest Fano metric, is the only requirement, and
this can be accomplished efficiently by an application of
systolic priority queues. Unfortunately, no quantitative
measure of the improvement in efficiency is provided by
Chang and Yao.
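
The idea that the best path can be kept available through purely local operations can be illustrated with a small software model of a linear-array priority queue. This is a sequential sketch under assumed names (LinearArrayQueue, insert, extract_best); it is not the specific design proposed by Chang and Yao, and in hardware the compare-and-exchange steps would proceed concurrently in neighboring cells rather than in a loop.

```python
class LinearArrayQueue:
    """Software model of a fixed-size linear array of compare-exchange cells.

    Cell 0 always holds the entry with the largest metric. Each cell only
    compares the incoming entry with its own and passes the smaller one on.
    """

    def __init__(self, size):
        self.cells = [None] * size

    def insert(self, metric, item):
        """Insert an entry; return the entry pushed out on overflow, else None."""
        carry = (metric, item)
        for i in range(len(self.cells)):
            if self.cells[i] is None:
                self.cells[i] = carry
                return None
            if carry[0] > self.cells[i][0]:
                # Local exchange: keep the larger metric, pass the smaller along.
                self.cells[i], carry = carry, self.cells[i]
        return carry  # buffer full: the worst path is dropped

    def extract_best(self):
        """Remove and return the entry with the largest metric (or None)."""
        best = self.cells[0]
        self.cells = self.cells[1:] + [None]  # every cell shifts one place up
        return best


queue = LinearArrayQueue(3)
for metric, name in [(1.0, "a"), (3.0, "b"), (2.0, "c")]:
    queue.insert(metric, name)
dropped = queue.insert(2.5, "d")  # the array is full, so one entry falls out
best = queue.extract_best()
print(best, dropped)
```

In this model an insertion into a full array discards the path with the smallest metric, which is one simple way to represent the buffer overflow mentioned above.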

Roughly speaking, systolic arrays, as introduced by Kung 
[3] and applied by Leiserson to priority queues [4], are a
form of parallel processing that has found many applications 
in VLSI design. Several designs for systolic queues for the 
determination of the best path are available, viz., random 
access memory (RAM), shift register scheme (SRS), and 
ripple register scheme (RRS). A general problem faced by 
many parallel processing schemes is the necessity of insertion 
of global controls for proper synchronization of the system. 
Chang and Yao recommend the RRS since it does not require 
global controls and maintains the local communication pro- 
perty (private communication contained in a report “Systolic 
Array Processing for Stacking Algorithms” which was submit- 
ted to Jet Propulsion Laboratory, Pasadena, California as a 
Second Progress Report). 

It is also noted by Chang and Yao [5] that the Viterbi
algorithm can be regarded as a matrix-vector multiplication
(here, matrix entries are from an algebra in which addition is
replaced by taking the minimum and multiplication by ordinary
addition). Therefore, this algorithm
lends itself to parallel processing, and systolic priority queues 
can be used for improvement of the Viterbi decoder. The 
idea of using parallel processing for VLSI design of the Viterbi 
decoder is, of course, not new, and substantial work 
has already been done in this area by researchers at JPL 
([6], [7], and [8]). 
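
The matrix-vector view of a single Viterbi step can be made concrete with a short Python sketch. It assumes the usual min-plus convention, in which the sum over states becomes a minimum and each product becomes an ordinary sum; the function name min_plus_matvec and the two-state trellis are hypothetical.

```python
import math

INF = math.inf  # marks a transition that does not exist in the trellis


def min_plus_matvec(branch, metrics):
    """One trellis step: new[j] = min over i of (branch[j][i] + metrics[i])."""
    return [min(b + m for b, m in zip(row, metrics)) for row in branch]


# Hypothetical 2-state trellis: branch[j][i] is the cost of going from state i to j.
branch = [[1.0, 3.0],
          [2.0, INF]]
metrics = [0.0, 5.0]  # accumulated path metrics before the step

new_metrics = min_plus_matvec(branch, metrics)
print(new_metrics)
```

Each output entry depends only on one matrix row and the metric vector, so the rows can be evaluated in parallel.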

III. Performance Statistics 

It is clear from the description of the stack algorithm
that the number of computations necessary to advance one
node in the code tree is a random variable N. At low
signal-to-noise ratio, one encounters situations where backtracking
is necessary and this effectively increases the mean of the 
random variable N. The behavior of N has been studied by a 
number of workers in coding theory. In particular, Jacobs and 
Berlekamp [9] obtained a lower bound for N. They showed 
that for fixed error or erasure probability, the distribution of 
N satisfies the following bound: 

P(N > t) > t^{-α} (1 + o(t))

It is important to note that this bound depends only on the 
channel error probability and the code rate and is independent 
of the choice of the method for selection of the best path. The 
code rate R and the exponent α are functionally dependent.
As α tends to 1 from below, the mean of N approaches infinity.
The value R_0 of R corresponding to α = 1 is called the
cutoff rate. Sequential decoding for R > R_0 is practically
impossible, since the number of computations becomes 
exceedingly large. 
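
For orientation, the cutoff rate has a simple closed form in one special case. Assuming a binary symmetric channel with crossover probability p (a hard-decision model that the work under review does not necessarily use), R_0 = 1 - log2(1 + 2 sqrt(p(1 - p))). The short Python sketch below evaluates this expression and compares it with two code rates; the function name bsc_cutoff_rate is hypothetical.

```python
import math


def bsc_cutoff_rate(p):
    """Cutoff rate R_0, in bits per channel use, of a BSC with crossover probability p."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))


for p in (0.01, 0.045, 0.1):
    r0 = bsc_cutoff_rate(p)
    for rate in (1 / 2, 1 / 4):
        verdict = "below" if rate < r0 else "at or above"
        print(f"p = {p}: R_0 = {r0:.3f}, code rate {rate} is {verdict} the cutoff rate")
```

Under this assumed channel model, R_0 shrinks as the channel becomes noisier, so a code rate that is acceptable at high SNR can exceed the cutoff rate at low SNR.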

In “A Simulation Study for the Stack Algorithm for Low 
SNR” (a preprint article submitted to Jet Propulsion Labora- 
tory, Pasadena, California), Chang reports on his simulation 
results on stack decoding for a (24, 1/4) convolutional code 
when SNR = E_b/N_0 is between 0.9 and 1.3 dB. As pointed
out by the author himself, R = 1/4 is greater than the cutoff
rate. Therefore, for a priori reasons, one cannot draw any opti- 
mistic conclusions about the performance of stack decoding 
on the basis of this work. Moreover, a comparison of these data
with those of S. Z. Kalson (JPL Internal Document, Memo
331-86.2-217, November 6, 1986), for a (15, 1/5) convolu- 
tional code, shows that the performance of the stack decoder 
at SNR = 0.9 dB is comparable to that of the Viterbi decoder 
at SNR = 0.4 dB. This comparison of the Viterbi and stack 
algorithms did not take into account the overhead due to the 
short frame length (= 100 bits) adopted by Chang. The loss
due to the overhead for a marker of length X and frame length
L is 10 log(1 + X/L) dB, where log is taken to base 10. Thus, for a
32-bit marker, the loss is about 1.2 dB.
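
The overhead figure can be checked directly, and the same formula shows how quickly the loss shrinks for longer frames. The following Python lines are a minimal sketch; the function name marker_overhead_db and the longer frame lengths are chosen only for illustration.

```python
import math


def marker_overhead_db(marker_bits, frame_bits):
    """Loss in dB from sending marker_bits of overhead with every frame_bits of data."""
    return 10.0 * math.log10(1.0 + marker_bits / frame_bits)


for frame_bits in (100, 1000, 10000):
    loss = marker_overhead_db(32, frame_bits)
    print(f"frame length {frame_bits}: loss = {loss:.3f} dB")
```

With the same 32-bit marker, a frame length of 1000 bits reduces the loss to roughly 0.14 dB.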

Some of the key parameters chosen by Chang for his study 
appear to be unrealistic. As pointed out earlier, the code rate 
1/4 and the frame length 100 are hardly acceptable choices 
for these parameters. Chang also gives no indication of the 
nature of the (24, 1/4) code he is using for his simulation. 
Special attention must be paid to the choice of the code, since
different codes of the same constraint length and rate perform
differently under sequential decoding. A discussion of what
constitutes a “good” code for sequential decoding appears
in [10]. It is also clear that the buffer size affects the error
probability and the performance of the stack algorithm. For 
a conclusive study of the possibilities of the stack algorithm 
at low SNR, the following points should be kept in mind: 

(1) Experiment with several, much longer (>1000) frame
lengths and lower rate codes. This would substantially
reduce the overhead and clarify the dependence of
the error probability (mainly frame deletion) on the
frame length.

(2) Make sure the chosen code is a “good” one.

(3) Quantify the dependence of error probability on the
buffer size and the computation time. (The latter
point is addressed by Chang.)

(4) Clarify the effect of using soft decisions on the
performance of the stack algorithm.

IV. Conclusion 

The application of systolic array architecture, as suggested 
by Chang and Yao, is a significant improvement in the sequen- 
tial stack decoding techniques at high signal-to-noise ratio. 
However, it is unlikely that the stack algorithm can serve as 
a viable alternative to the Viterbi algorithm at low SNR. 

Fig. 1. Flow chart for stack decoding